Exploration Of Red Wine Quality
by
Will Everhard


This data comes from a study of wine quality. The original study covered both red and white wines, but I will use only the red wine portion to keep the analysis smaller. Directly from the documentation: “The inputs include objective tests (e.g. PH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent).”

“The dataset is related to red variants of the Portuguese ‘Vinho Verde’ wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).”

Description of the variables:

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"

After looking at the variables, I noticed there is an extra variable named “X”. I didn’t see any documentation on it, so I will keep that in mind as I investigate.

Here, I further investigate the variable types…

## 'data.frame':    1599 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...

There are 1599 observations (wines) and 13 variables in this study. All of them are stored as num except for “X” and “quality”, which are stored as int.
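As a small illustration of the type check described above, the sketch below rebuilds a few columns from the first ten rows printed in the str() output and inspects their classes. The object names (`wine_head`, `col_classes`, `x_is_row_index`) are my own, and ten rows are only a sample, not the full 1599 wines. The last check hints that “X” may simply be a row counter, which is an inference from the printed values rather than something the documentation states.

```r
# First ten values of selected columns, copied from the str() output above.
# This is only a sample of the data, not the full 1599 observations.
wine_head <- data.frame(
  X             = 1:10,
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  pH            = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  alcohol       = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality       = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Storage class of each column.
col_classes <- sapply(wine_head, class)
print(col_classes)

# Does X just count the rows 1, 2, 3, ...? (Assumption being checked, not a documented fact.)
x_is_row_index <- identical(wine_head$X, seq_len(nrow(wine_head)))
print(x_is_row_index)
```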

Since the point of the data article is focused on quality, I will single out quality first…

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.636   6.000   8.000

It appears that quality ranges from 3 to 8 even though the documentation says the scale goes from 0 to 10.

Now I will check out a summary of the rest of the data…

##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median : 2.200   Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol     
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.636  
##  3rd Qu.:6.000  
##  Max.   :8.000

I am interested in how variables correlate with quality.

Acids related thoughts: Acidity is measured by pH. The lower the pH, the higher the acidity. For example, stomach acid has a pH of about 1.0 and antacids have a pH of about 10.5. The average pH levels for red wines are between 3.5 and 3.8.

As with density, discussed next, the pH statistics observed here roughly match the norm, with the minimum at 2.740, the mean at 3.311, the median at 3.310, and the max at 4.010.
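To make the pH scale more concrete: pH is the negative base-10 logarithm of the hydrogen ion concentration, so each step of 1 on the scale is a tenfold change in acidity. The sketch below applies that standard definition to the minimum and maximum pH from the summary; the function name `h_conc` is my own.

```r
# Hydrogen ion concentration (mol/L) implied by a pH value: [H+] = 10^(-pH).
h_conc <- function(ph) 10^(-ph)

ph_min <- 2.74  # lowest pH in the summary above
ph_max <- 4.01  # highest pH in the summary above

# How many times more acidic the lowest-pH wine is than the highest-pH wine.
acidity_ratio <- h_conc(ph_min) / h_conc(ph_max)
print(acidity_ratio)
```

So although the pH values only span a little over one unit, the most acidic wine in the data is well over ten times as acidic as the least acidic one.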

Density related thoughts: Density in wine is measured with a hydrometer. Water has a density of 1, and that is the reference against which the density of wine is compared. The density of grape juice is higher than that of water because it contains sugars and other things like pigments. The typical density of the must (the term used for the juice before the yeast is added) is generally between 1.080 and 1.090. This essentially means the must is 8-9% denser than water.

Alcohol has a density of approximately 0.8, or 20% less than water. As the yeast consumes the sugar and converts it to alcohol, the must gradually becomes less dense. After fermentation is complete, the density of the wine should be roughly at, or slightly less than, 1.00, often 0.996.

Comparatively speaking, the statistics observed here relatively match the norm with the minimum at 0.9901, the mean at 0.9967, the median at 0.9968, and the max at 1.0037.
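The density arithmetic above can be sketched in a few lines. This is a rough illustration only: it assumes the alcohol column is percent by volume and that water and alcohol mix linearly by volume, which real liquids do not do exactly. All variable names are my own.

```r
# Must density range quoted above, relative to water = 1.
must_density <- c(low = 1.080, high = 1.090)

# Percent denser than water.
pct_denser <- (must_density - 1) * 100
print(pct_denser)

# Rough density of a plain water + alcohol blend at the mean alcohol level.
# Assumptions: alcohol is % by volume, densities combine linearly by volume.
abv             <- 10.42 / 100  # mean alcohol from the summary
water_density   <- 1.0
alcohol_density <- 0.8          # approximate value quoted in the text
blend_density <- (1 - abv) * water_density + abv * alcohol_density
print(blend_density)

# Observed mean density from the summary, for comparison.
observed_mean_density <- 0.9967
print(observed_mean_density - blend_density)
```

Under these assumptions the plain blend comes out lighter than the observed mean density, which is consistent with the wine still carrying dissolved material such as residual sugar and acids.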

None of the wines are near the article’s stated level of sweet wines at a residual sugar level of 45 grams per liter so sweet wines can be ruled out of this exploration.

Sulfurs related thoughts: It is widely accepted amongst wine drinkers that an aged wine is a “better” wine. Sulfites occur naturally in wine and are also added to prevent microbial growth and oxidation, but with too many sulfites the wine can have a hint of onion or egg yolk in its smell. Sulfites dissipate over time, so there may be a connection between sulfite levels and the age and quality of a wine.

It has also been said in an article about sulfites in wine, that, “Natural Wines have an authenticity of taste that most modern wines have lost. They’re complex, unusual, surprising, joyful. They leap out of the glass with a vivacity that’s far too rare in today’s winemaking world. When you drink Natural Wine, you can taste its origin, its terroir, and all the subtleties that make it unique. That’s only possible in low-sulfite wines.”

I could not quickly find an explanation of how sulphates were measured in this data, so I will use the information above as background when looking at how sulphates relate to higher quality wines.

Salty thoughts: As far as salinity goes, salt can accentuate aromatics from a terroir because the taster associates the smells with the taste of salt, and a saltier wine is sometimes sought out for that reason. However, Roman Roth, winemaker at Wölffer Estate in Sagaponack, New York, says that a good wine should always have “a number of things competing for your attention. Is it acidity? Is it minerality? Is it tannins, or creamy yeast characters? Is it salinity? Nothing should stand out. They should all be in a harmonious balance, making the wine interesting and giving it finesse.”

I was not able to find a reliable source in a timely manner for a standard of when a wine becomes “too” salty. I will look for a general balance of chloride in relation to quality considering the article I quoted.

Univariate Plots Section

I’ll start with quality since it is the title of the data article.

There appears to be a normal distribution with a very small rating range. Considering most of the wines ranked within the qualities of 5 & 6, I will create 3 ranges from the data ratings as Low Quality, Average, and High Quality to observe the data more closely.

Univariate Analysis

There are a great many average quality red wines (the quality ratings of 5 and 6). There are a lot of low alcohol wines around 9%. The sulphates range seems pretty low, going from 0.33 to 2.0 with most of the sulphate levels just above 0.5. pH and density have a normal distribution with their averages in the middle of their ranges. Total sulfur dioxide, free sulfur dioxide, chlorides, residual sugar, and fixed acidity all seem to have the majority of their values on the lower end of their ranges. Citric acid and volatile acidity have the most distinctive distributions, appearing almost bimodal and right skewed.

I am mostly interested in looking at the acids since they have the most variance. I will also compare the other variables against quality.

I turned quality into a factor and combined levels to get a larger pool of samples for high quality and low quality wines since the majority of the samples are within 5 and 6.
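A minimal sketch of this regrouping with base R's `cut()` is shown below. The exact cut points are an assumption on my part: scores 3-4 as Low Quality, 5-6 as Average, and 7-8 as High Quality, which matches the description of 5 and 6 as average. The ten quality scores are the ones printed in the str() output, so the counts here describe only that sample.

```r
# First ten quality scores, copied from the str() output.
quality <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Combine the scores into three ordered groups.
# Assumed breaks: (2,4] = Low Quality, (4,6] = Average, (6,8] = High Quality.
quality_group <- cut(
  quality,
  breaks = c(2, 4, 6, 8),
  labels = c("Low Quality", "Average", "High Quality")
)

# Number of sample wines in each group.
group_counts <- table(quality_group)
print(group_counts)
```

Because `cut()` returns a factor, the result can be used directly as a grouping or colouring variable in later plots.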

Bivariate Plots Section

Alcohol had the strongest positive correlation I came across, but oddly it was lowest in the middle group before spiking in the higher quality wines. I took a closer look using the original quality scores. Interestingly, the lowest mean alcohol is at quality 5, which also holds the highest alcohol levels recorded in the observations, as outliers.
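The per-score comparison described here can be sketched with `tapply()`. The values are the first ten rows from the str() output, so the numbers are illustrative and will differ from the full data; the variable names are my own.

```r
# First ten alcohol and quality values, copied from the str() output.
alcohol <- c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5)
quality <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Mean and maximum alcohol for each original quality score.
mean_alcohol <- tapply(alcohol, quality, mean)
max_alcohol  <- tapply(alcohol, quality, max)

print(round(mean_alcohol, 2))
print(max_alcohol)
```

Even in this tiny sample, quality 5 has the lowest mean alcohol while also containing the single highest alcohol value, which is the same pattern noted above.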

Sulphates have a steady positive correlation with quality.

pH has a slight negative correlation with quality.

Density seems to have little to no effect on quality though it is slightly lower in quality wines.

Total sulfur dioxide has no visible effect on quality.

Free sulfur dioxide has no visible effect on quality.

Chlorides have, at best, minimal effect on quality.

Residual sugar seems to have no effect on quality.

Citric acid shows a clear positive correlation to quality.

Volatile acidity shows a clear negative correlation with quality.

Fixed acidity seems to have a positive correlation with quality.
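One way to put numbers on the relationships listed above is to compute a correlation coefficient between each variable and quality. The sketch below uses Pearson correlation via `cor()`, which is my choice and not necessarily the method behind the plots. It runs on the first ten rows from the str() output only, so the coefficients will not match the full data. Density is left out because the str() output truncates its values.

```r
# First ten rows of each variable, copied from the str() output.
wine_head <- data.frame(
  fixed.acidity        = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  volatile.acidity     = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  citric.acid          = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  residual.sugar       = c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1),
  chlorides            = c(0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069, 0.065, 0.073, 0.071),
  free.sulfur.dioxide  = c(11, 25, 15, 17, 11, 13, 15, 15, 9, 17),
  total.sulfur.dioxide = c(34, 67, 54, 60, 34, 40, 59, 21, 18, 102),
  pH                   = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35),
  sulphates            = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.8),
  alcohol              = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality              = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Pearson correlation of every other column with quality.
predictors <- setdiff(names(wine_head), "quality")
cor_with_quality <- sapply(
  wine_head[predictors],
  function(column) cor(column, wine_head$quality)
)

# Strongest positive correlations first.
print(round(sort(cor_with_quality, decreasing = TRUE), 2))
```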

This plot confuses me a bit. High pH means low acidity yet this plot suggests a positive correlation. Volatile does mean unstable though so I can make sense of the acidity dissipating.

This contrast makes perfect sense with pH and fixed acidity inversely correlated.

This plot agrees with acidity inversely correlating with pH levels.
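The inverse relationship between fixed acidity and pH can also be checked with a simple linear fit. The sketch below uses `lm()` on the first ten rows from the str() output; with so few points (and one wine at 11.2 fixed acidity pulling the line) it only illustrates the direction of the slope, not its size in the full data.

```r
# First ten fixed acidity and pH values, copied from the str() output.
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5),
  pH            = c(3.51, 3.2, 3.26, 3.16, 3.51, 3.51, 3.3, 3.39, 3.36, 3.35)
)

# Straight-line fit of pH on fixed acidity.
fit <- lm(pH ~ fixed.acidity, data = wine_head)

# A negative slope means pH falls as fixed acidity rises.
ph_slope <- coef(fit)[["fixed.acidity"]]
print(ph_slope)
```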

Citric acid and sulphates show a positive correlation.

pH and sulphates show a slight negative correlation.

This plot makes sense of the pH vs. sulphates result: since acidity moves opposite to pH, there is, and should be, a slight positive correlation here.

This plot further agrees with earlier findings of volatile acidity acting more basic in measure than acidic.

Bivariate Analysis

Volatile acidity looks to be the most inversely correlated with quality.

Citric acid, fixed acidity, sulphates, and alcohol have the best positive correlations with quality.

High levels of alcohol seem to have the strongest correlation with high quality red wines.

Multivariate Plots Section

This graph surprises me in that it suggests a positive correlation between alcohol and pH. Though in the higher quality wines we see that pH caps off at 3.75 while low quality and average go higher in pH.

Overall we see a negative correlation between alcohol and fixed acidity but when looking at the different levels of quality, higher quality wines have higher alcohol and higher fixed acidity when compared to lower levels of quality. I do not see a strong relationship here and wine is generally acidic.

Here we see that alcohol goes higher with quality and volatile acidity goes lower with quality. They have an inverse relationship towards each other when determining quality.

Here we see a positive correlation with citric acid and alcohol in regards to quality. I notice in high quality wines that citric acid tops off around 0.75. It does show some of the high quality wines having almost no citric acid at all. I speculate that citric acid is more of a popular personal taste of certain varietals than a 100% clear decisive determining factor of quality wine.
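The observation about where citric acid "tops off" in each quality level can be summarised numerically with `aggregate()`. This sketch again uses the first ten rows from the str() output and my assumed grouping (3-4 Low Quality, 5-6 Average, 7-8 High Quality), so the values are illustrative only and no Low Quality wines appear in the sample.

```r
# First ten rows of the relevant columns, copied from the str() output.
wine_head <- data.frame(
  alcohol     = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  citric.acid = c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36),
  quality     = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Assumed grouping of the quality scores into three levels.
wine_head$quality_group <- cut(
  wine_head$quality,
  breaks = c(2, 4, 6, 8),
  labels = c("Low Quality", "Average", "High Quality")
)

# Mean and maximum of alcohol and citric acid within each quality level.
group_mean <- aggregate(cbind(alcohol, citric.acid) ~ quality_group,
                        data = wine_head, FUN = mean)
group_max  <- aggregate(cbind(alcohol, citric.acid) ~ quality_group,
                        data = wine_head, FUN = max)

print(group_mean)
print(group_max)
```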

This graph shows a slight positive correlation between alcohol and sulphates. I’m noticing here that sulphates in the higher quality cap off lower than the other levels of quality.

I see a positive correlation between citric acid and sulphates but I also see that the range of sulphates shrinks as the quality rises.

I see a negative correlation between alcohol and density but the combination of alcohol and density rises with quality.

This graph clearly shows high alcohol and low residual sugars with high quality red wines.

Density has a varying range amongst higher quality wines while residual sugar remains low.

The higher the sulphates, the lower the volatile acidity. This makes sense because sulphates help prevent oxidation. However, sulphates are higher on average in high quality wines but the range of sulphates shrinks with higher quality and is still on the lower side overall.
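The claim that the range of sulphates shrinks with quality can be checked by measuring the spread (max minus min) within each quality level. The sketch below does this for the first ten rows from the str() output, using my assumed grouping of the scores; with only two High Quality wines in the sample, the result shows the calculation rather than confirming the finding.

```r
# First ten sulphates and quality values, copied from the str() output.
sulphates <- c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47, 0.57, 0.8)
quality   <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Assumed grouping of the quality scores into three levels.
quality_group <- cut(
  quality,
  breaks = c(2, 4, 6, 8),
  labels = c("Low Quality", "Average", "High Quality")
)

# Spread of sulphates within each level; droplevels() removes levels
# with no wines so that empty groups do not produce NA.
sulphate_spread <- tapply(
  sulphates,
  droplevels(quality_group),
  function(x) diff(range(x))
)
print(round(sulphate_spread, 2))
```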

pH seems to have little to no effect on quality while sulphates remain the same in high quality wines with a small range averaging on lower levels.

Sulphates contrasted with fixed acidity seem to have little to no effect on quality.

Multivariate Analysis

Alcohol seemed overall to strengthen everything with some help from citric acid and sulphates. However, alcohol and citric acid capped off at a medium to high level where sulphates in the highest quality wines were low even though sulphates helped to raise the quality.

I was surprised to see that density had a positive effect in high quality wines, since residual sugar had a negative effect, high sugar is usually associated with high density, and alcohol makes wine less dense. I wonder if a higher concentration of grape skin pigments makes for a denser, more aromatic wine while residual sugar stays low.

Final Plots and Summary

Plot One

Here we see alcohol greatly raises the quality of wines with the help of citric acid. Citric acid caps off lower in the high quality wines compared to the other levels of wine quality, but the values cluster above mid level for citric acid. That being said, citric acid doesn’t make great wine by itself, but within a certain range it certainly seems to be a common trait of high quality wine.

Plot Two

Here we see sulphates playing a role similar to citric acid, in that they aren’t maximized in high quality wines but just the right amount, within a small range, elevates the wine quality along with alcohol.

Plot Three

Here we see citric acid and sulphates positively affecting the quality of wine, but within a small range compared to average and low quality.

Reflection

Overall, I found that alcohol plays a large role in raising red wine quality, with a little help from citric acid, density, and sulphates. Obviously, alcohol alone does not determine a high quality wine. One thing to consider from these results is that alcohol is volatile and can therefore carry aromatics more potently. Low levels of sugar in high quality wines, together with density increasing with alcohol and quality, suggest other elements at work, perhaps the concentration or thickness of the grape skins, which give high tannins, deep color, and aromatics.

Another inference I draw, from seeing low levels of sulphates in high quality wines alongside the need for them to be present, is this: aged wines are generally considered “higher quality” wines, so I suggest the sulphates did their job preserving the wine and then dissipated as the wine aged. Wines that never had enough sulphates in them probably oxidized early on and became less favored. Wines that had a lot of sulphates probably had too much added or were not aged long enough for the sulphates to dissipate. It seems the quote at the beginning of my analysis rings true that high quality wines are well balanced.

I think the most difficult part of this analysis was focusing on certain points and not overthinking it. Data can be shown in many ways and still make sense, so it was difficult having such a variety to choose from. For further analysis of red wine quality, new variables like grape varietal, terroir, harvest season, weather statistics for each year the grapes were harvested, etc. could make a significant impact on determining which variables make wine quality high.